A Hierarchical Bayesian Approach for Semi-supervised Discriminative Language Modeling
Authors
Abstract
Discriminative language modeling provides a mechanism for differentiating between competing word hypotheses, which are usually ignored in traditional maximum likelihood estimation of N-gram language models. Discriminative language modeling usually requires manual transcription, which can be costly and slow to obtain. On the other hand, there is a vast amount of untranscribed speech data to which offline adaptation techniques can be applied to generate pseudo-truth transcriptions as an approximation to manual transcriptions. Viewing manual and pseudo-truth transcriptions as two domains, we perform hierarchical Bayesian domain adaptation on discriminative language models sharing a common prior model. Domain-specific and prior models are estimated jointly using training data. In the N-best list rescoring experiment, hierarchical Bayesian domain adaptation yielded better recognition performance than the model trained only on manual transcription, and appears robust against an inferior prior.
Similar resources
Risk-Based Semi-Supervised Discriminative Language Modeling for Broadcast Transcription
This paper describes a new method for semi-supervised discriminative language modeling, which is designed to improve the robustness of a discriminative language model (LM) obtained from manually transcribed (labeled) data. The discriminative LM is implemented as a log-linear model, which employs a set of linguistic features derived from word or phoneme sequences. The proposed semi-supervised di...
Performance Comparison of Training Algorithms for Semi-Supervised Discriminative Language Modeling
Discriminative language modeling (DLM) has been shown to improve the accuracy of automatic speech recognition (ASR) systems, but it requires large amounts of both acoustic and text data for training. One way to overcome this is to use simulated hypotheses instead of real hypotheses for training, which is called semisupervised training. In this study, we compare six different perceptron algorith...
A Bayesian Model for Generative Transition-based Dependency Parsing
We propose a simple, scalable, fully generative model for transition-based dependency parsing with high accuracy. The model, parameterized by Hierarchical Pitman-Yor Processes, overcomes the limitations of previous generative models by allowing fast and accurate inference. We propose an efficient decoding algorithm based on particle filtering that can adapt the beam size to the uncertainty in t...
Unsupervised training methods for discriminative language modeling
Discriminative language modeling (DLM) aims to choose the most accurate word sequence by reranking the alternatives output by the automatic speech recognizer (ASR). The conventional (supervised) way of training a DLM requires a large amount of acoustic recordings together with their manual reference transcriptions. These transcriptions are used to determine the target ranks of the ASR outputs, ...
Generative and Discriminative Learning in Semantic Role Labeling for Italian
In this paper, we present a Semantic Role Labeling tool for the Italian language for the FLaIT competition at Evalita 2011. This tool presents a hybrid approach to resolve the different sub-tasks that compose the SRL task. We apply a discriminative model for the boundary detection task based on lexical and syntactical features. A distributional approach to modeling lexical semantic information, i...